Sentiment and Reactions on Twitter

José Manuel Magallanes Ph.D.

CREDITS and TERMS OF USE

Detecting the sentiment of messages on Twitter has become a topic of interest. In our case, we used the classifier proposed by Elliot Hoffman for the specific case of the Spanish language.

Since the data is being collected for the 2018 electoral process, we present the Tweets in two blocks: from July 15 (the end of the FIFA World Cup in Russia) to August 31, and from September 1 to September 23 (one day before the debate). The Tweets have been filtered: retweets that a candidate made of another user's posts do not appear. Likewise, each Tweet has been cleaned by removing web addresses, user mentions (words starting with @), and words starting with #. Each sentiment score computed for a Tweet is accompanied by its acceptance (number of likes) and its promotion or exposure (number of retweets).
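
The filtering and cleaning steps described above can be sketched as two small functions. This is a minimal illustration that mirrors the rules used in the cells of this notebook; the names `is_retweet` and `clean_tweet` and the sample text are hypothetical and do not appear in the original code.

```python
def is_retweet(text):
    """Return True when the raw text starts with the retweet marker 'RT'."""
    return text.startswith('RT')


def clean_tweet(text):
    """Drop web addresses, @mentions, #hashtags and stray 'RT' tokens."""
    kept = [word for word in text.split()
            if 'http' not in word
            and not word.startswith('@')
            and not word.startswith('#')
            and word != 'RT']
    return " ".join(kept)


# Made-up tweet, for illustration only
raw = "Gracias @vecinos por su apoyo #Lima2018 https://t.co/abc123"
print(clean_tweet(raw))  # prints: Gracias por su apoyo
```

Because the rules work word by word, a hashtag used inside a sentence is removed together with its word, which can shorten the text the classifier sees.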

In [19]:
from classifier import *
clf = SentimentClassifier()

Reggiardo's messages

The following charts summarize the three variables measured for each Tweet:

  • Tweets from July 15 to August 31:
In [20]:
%matplotlib inline
import json

import matplotlib.pyplot as plt
import pandas as pd

# JSON file with the Tweets collected for this candidate and date block
candidatoFile='renzo_ago'
with open(candidatoFile, 'r') as fd:
    tweets=json.load(fd)

# Keep id, text, date, likes and retweets;
# some Tweets carry the text under 'full_text' instead of 'text'
tweetText=[]
for tw in tweets:
    try:
        tweetText.append((tw['id'],tw['text'],tw['created_at'],tw['favorite_count'],tw['retweet_count']))
    except KeyError:
        tweetText.append((tw['id'],tw['full_text'],tw['created_at'],tw['favorite_count'],tw['retweet_count']))

rows=[]
for idt,txt,date,likes,retws in tweetText:
    # Skip retweets of other users
    if not txt.startswith('RT'):
        # Remove links, mentions, hashtags and stray 'RT' tokens
        txt = " ".join([word for word in txt.split()
                                    if 'http' not in word
                                        and not word.startswith('@')
                                        and not word.startswith('#')
                                        and word != 'RT'
                                    ])
        # Skip Tweets that start with these invitation phrases
        if not txt.startswith(('Te invito a darte','Te invito a ingresar')):
            senti=clf.predict(txt)
            rows.append([idt,likes,retws,senti,date,txt])

df=pd.DataFrame(rows,columns=["idT","Likes","ReTweets","Positividad","fecha","txt"])

df["fecha"]=pd.to_datetime(df["fecha"])

# Sort by date, use the date as index, and save the cleaned data
df.sort_values(by=['fecha'],inplace=True,ascending=True)
df.set_index('fecha',inplace=True)
df.to_csv(candidatoFile+'_data.csv',encoding='utf-8')

# One panel per variable (Likes, ReTweets, Positividad);
# the dashed red line marks 0.5 on the Positividad panel
axes=df.drop(columns=['txt','idT']).plot(subplots=True,figsize=(18, 10),grid=True,sharex=True)
axes[2].axhline(y=0.5, color='r', linestyle='--')
plt.show()
print ("Cantidad de Tweets: ",df.shape[0])
Cantidad de Tweets:  77
  • Tweets from September 1 to September 23 (one day before the debate):
In [21]:
import json
import pandas as pd

candidatoFile='renzo_predeb'
with open(candidatoFile, 'r') as fd:
    tweets=json.load(fd)
    
tweetText=[]
for tw in tweets:
    try:
        tweetText.append((tw['id'],tw['text'],tw['created_at'],tw['favorite_count'],tw['retweet_count']))
    except KeyError:
        tweetText.append((tw['id'],tw['full_text'],tw['created_at'],tw['favorite_count'],tw['retweet_count']))
        
rows=[]
for idt,txt,date,likes,retws in tweetText:
    if not txt.startswith('RT'):
        txt = " ".join([word for word in txt.split()
                                    if 'http' not in word
                                        and not word.startswith('@')
                                        and not word.startswith('#')
                                        and word != 'RT'
                                    ])
        if not txt.startswith(('Te invito a darte','Te invito a ingresar')):
            senti=clf.predict(txt)
            rows.append([idt,likes,retws,senti,date,txt])
        


df=pd.DataFrame(rows,columns=["idT","Likes","ReTweets","Positividad","fecha","txt"])

df["fecha"]=pd.to_datetime(df["fecha"])

df.sort_values(by=['fecha'],inplace=True,ascending=True)
df.set_index('fecha',inplace=True)
df.to_csv(candidatoFile+'_data.csv',encoding='utf-8')

axes=df.drop(columns=['txt','idT']).plot(subplots=True,figsize=(18, 10),grid=True,sharex=True)
axes[2].axhline(y=0.5, color='r', linestyle='--')
plt.show()
print ("Cantidad de Tweets: ",df.shape[0])
Cantidad de Tweets:  67
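
As a rough numeric companion to the three panels, the sketch below summarizes one block of Tweets: how many there are, their average likes and retweets, and the share whose score lies above the 0.5 reference line drawn in the plots. It assumes the score lies between 0 and 1, as that reference line suggests. The function `summarize` and the sample values are hypothetical; they are not part of the original analysis and the numbers are not taken from any candidate's data.

```python
from statistics import mean


def summarize(rows, threshold=0.5):
    """Summarize (likes, retweets, positivity) tuples for one block of Tweets."""
    if not rows:
        return {"tweets": 0, "mean_likes": 0.0,
                "mean_retweets": 0.0, "share_positive": 0.0}
    above = sum(1 for _, _, score in rows if score > threshold)
    return {
        "tweets": len(rows),
        "mean_likes": mean(likes for likes, _, _ in rows),
        "mean_retweets": mean(retws for _, retws, _ in rows),
        "share_positive": above / len(rows),
    }


# Made-up values, for illustration only
example = [(10, 2, 0.8), (4, 1, 0.3), (7, 3, 0.6), (3, 2, 0.1)]
print(summarize(example))
```

A summary like this makes blocks of very different sizes easier to compare than the raw time series alone.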



Belmont's messages

The following charts summarize the three variables measured for each Tweet:

  • Tweets from July 15 to August 31:
In [22]:
candidatoFile='rbc_ago'
with open(candidatoFile, 'r') as fd:
    tweets=json.load(fd)
    
tweetText=[]
for tw in tweets:
    try:
        tweetText.append((tw['id'],tw['text'],tw['created_at'],tw['favorite_count'],tw['retweet_count']))
    except KeyError:
        tweetText.append((tw['id'],tw['full_text'],tw['created_at'],tw['favorite_count'],tw['retweet_count']))
        
rows=[]
for idt,txt,date,likes,retws in tweetText:
    if not txt.startswith('RT'):
        txt = " ".join([word for word in txt.split()
                                    if 'http' not in word
                                        and not word.startswith('@')
                                        and not word.startswith('#')
                                        and word != 'RT'
                                    ])
        if not txt.startswith(('Te invito a darte','Te invito a ingresar')):
            senti=clf.predict(txt)
            rows.append([idt,likes,retws,senti,date,txt])
        


df=pd.DataFrame(rows,columns=["idT","Likes","ReTweets","Positividad","fecha","txt"])

df["fecha"]=pd.to_datetime(df["fecha"])

df.sort_values(by=['fecha'],inplace=True,ascending=True)
df.set_index('fecha',inplace=True)
df.to_csv(candidatoFile+'_data.csv',encoding='utf-8')

axes=df.drop(columns=['txt','idT']).plot(subplots=True,figsize=(18, 10),grid=True,sharex=True)
axes[2].axhline(y=0.5, color='r', linestyle='--')
plt.show()
print ("Cantidad de Tweets: ",df.shape[0])
Cantidad de Tweets:  58
  • Tweets from September 1 to September 23 (one day before the debate):
In [23]:
candidatoFile='rbc_predeb'
with open(candidatoFile, 'r') as fd:
    tweets=json.load(fd)
    
tweetText=[]
for tw in tweets:
    try:
        tweetText.append((tw['id'],tw['text'],tw['created_at'],tw['favorite_count'],tw['retweet_count']))
    except KeyError:
        tweetText.append((tw['id'],tw['full_text'],tw['created_at'],tw['favorite_count'],tw['retweet_count']))
        
rows=[]
for idt,txt,date,likes,retws in tweetText:
    if not txt.startswith('RT'):
        txt = " ".join([word for word in txt.split()
                                    if 'http' not in word
                                        and not word.startswith('@')
                                        and not word.startswith('#')
                                        and word != 'RT'
                                    ])
        if not txt.startswith(('Te invito a darte','Te invito a ingresar')):
            senti=clf.predict(txt)
            rows.append([idt,likes,retws,senti,date,txt])
        


df=pd.DataFrame(rows,columns=["idT","Likes","ReTweets","Positividad","fecha","txt"])

df["fecha"]=pd.to_datetime(df["fecha"])

df.sort_values(by=['fecha'],inplace=True,ascending=True)
df.set_index('fecha',inplace=True)
df.to_csv(candidatoFile+'_data.csv',encoding='utf-8')

axes=df.drop(columns=['txt','idT']).plot(subplots=True,figsize=(18, 10),grid=True,sharex=True)
axes[2].axhline(y=0.5, color='r', linestyle='--')
plt.show()
print ("Cantidad de Tweets: ",df.shape[0])
Cantidad de Tweets:  8
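
The cells up to this point repeat the same steps and change only the file name. One possible way to avoid the repetition is to move the loading, filtering and scoring into a helper, as sketched below. The names `build_rows`, `load_rows` and `fake_predict` are hypothetical; `fake_predict` is only a stand-in so the sketch runs without the classifier, and in the notebook the scorer would be `clf.predict`.

```python
import json

SKIP_PREFIXES = ('Te invito a darte', 'Te invito a ingresar')


def build_rows(tweets, predict):
    """Turn raw tweet dicts into [id, likes, retweets, score, date, text] rows."""
    rows = []
    for tw in tweets:
        # Some tweets carry the text under 'full_text' instead of 'text'
        text = tw['text'] if 'text' in tw else tw['full_text']
        if text.startswith('RT'):
            continue  # retweets are left out
        text = " ".join(word for word in text.split()
                        if 'http' not in word
                        and not word.startswith('@')
                        and not word.startswith('#')
                        and word != 'RT')
        if text.startswith(SKIP_PREFIXES):
            continue  # invitation tweets are left out
        rows.append([tw['id'], tw['favorite_count'], tw['retweet_count'],
                     predict(text), tw['created_at'], text])
    return rows


def load_rows(path, predict):
    """Read one JSON file of tweets and build its rows."""
    with open(path, 'r') as fd:
        return build_rows(json.load(fd), predict)


def fake_predict(text):
    """Stand-in scorer for illustration; not the real classifier."""
    return 0.5


# Made-up tweets, for illustration only
sample = [
    {'id': 1, 'text': 'RT @otro: hola', 'created_at': 'Mon Sep 03 12:00:00 +0000 2018',
     'favorite_count': 1, 'retweet_count': 0},
    {'id': 2, 'full_text': 'Hola Lima #voto https://t.co/x',
     'created_at': 'Mon Sep 03 13:00:00 +0000 2018',
     'favorite_count': 5, 'retweet_count': 2},
]
print(build_rows(sample, fake_predict))
```

With a helper like this, each cell could reduce to a call such as `rows = load_rows('rbc_ago', clf.predict)` followed by the DataFrame and plotting lines.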



Urresti's messages

The following charts summarize the three variables measured for each Tweet:

  • Tweets from July 15 to August 31:
In [24]:
candidatoFile='urresti_ago'
with open(candidatoFile, 'r') as fd:
    tweets=json.load(fd)
    
tweetText=[]
for tw in tweets:
    try:
        tweetText.append((tw['id'],tw['text'],tw['created_at'],tw['favorite_count'],tw['retweet_count']))
    except KeyError:
        tweetText.append((tw['id'],tw['full_text'],tw['created_at'],tw['favorite_count'],tw['retweet_count']))
        
rows=[]
for idt,txt,date,likes,retws in tweetText:
    if not txt.startswith('RT'):
        txt = " ".join([word for word in txt.split()
                                    if 'http' not in word
                                        and not word.startswith('@')
                                        and not word.startswith('#')
                                        and word != 'RT'
                                    ])
        if not txt.startswith(('Te invito a darte','Te invito a ingresar')):
            senti=clf.predict(txt)
            rows.append([idt,likes,retws,senti,date,txt])
        


df=pd.DataFrame(rows,columns=["idT","Likes","ReTweets","Positividad","fecha","txt"])

df["fecha"]=pd.to_datetime(df["fecha"])

df.sort_values(by=['fecha'],inplace=True,ascending=True)
df.set_index('fecha',inplace=True)
df.to_csv(candidatoFile+'_data.csv',encoding='utf-8')

axes=df.drop(columns=['txt','idT']).plot(subplots=True,figsize=(18, 10),grid=True,sharex=True)
axes[2].axhline(y=0.5, color='r', linestyle='--')
plt.show()
print ("Cantidad de Tweets: ",df.shape[0])
Cantidad de Tweets:  849
  • Tweets from September 1 to September 23 (one day before the debate):
In [25]:
candidatoFile='urresti_predeb'
with open(candidatoFile, 'r') as fd:
    tweets=json.load(fd)
    
tweetText=[]
for tw in tweets:
    try:
        tweetText.append((tw['id'],tw['text'],tw['created_at'],tw['favorite_count'],tw['retweet_count']))
    except KeyError:
        tweetText.append((tw['id'],tw['full_text'],tw['created_at'],tw['favorite_count'],tw['retweet_count']))
        
rows=[]
for idt,txt,date,likes,retws in tweetText:
    if not txt.startswith('RT'):
        txt = " ".join([word for word in txt.split()
                                    if 'http' not in word
                                        and not word.startswith('@')
                                        and not word.startswith('#')
                                        and word != 'RT'
                                    ])
        if not txt.startswith(('Te invito a darte','Te invito a ingresar')):
            senti=clf.predict(txt)
            rows.append([idt,likes,retws,senti,date,txt])
        


df=pd.DataFrame(rows,columns=["idT","Likes","ReTweets","Positividad","fecha","txt"])

df["fecha"]=pd.to_datetime(df["fecha"])

df.sort_values(by=['fecha'],inplace=True,ascending=True)
df.set_index('fecha',inplace=True)
df.to_csv(candidatoFile+'_data.csv',encoding='utf-8')

axes=df.drop(columns=['txt','idT']).plot(subplots=True,figsize=(18, 10),grid=True,sharex=True)
axes[2].axhline(y=0.5, color='r', linestyle='--')
plt.show()
print ("Cantidad de Tweets: ",df.shape[0])
Cantidad de Tweets:  117



Lay's messages

The following charts summarize the three variables measured for each Tweet:

  • Tweets from July 15 to August 31:
In [26]:
candidatoFile='lay_ago'
with open(candidatoFile, 'r') as fd:
    tweets=json.load(fd)
    
tweetText=[]
for tw in tweets:
    try:
        tweetText.append((tw['id'],tw['text'],tw['created_at'],tw['favorite_count'],tw['retweet_count']))
    except KeyError:
        tweetText.append((tw['id'],tw['full_text'],tw['created_at'],tw['favorite_count'],tw['retweet_count']))
        
rows=[]
for idt,txt,date,likes,retws in tweetText:
    if not txt.startswith('RT'):
        txt = " ".join([word for word in txt.split()
                                    if 'http' not in word
                                        and not word.startswith('@')
                                        and not word.startswith('#')
                                        and word != 'RT'
                                    ])
        if not txt.startswith(('Te invito a darte','Te invito a ingresar')):
            senti=clf.predict(txt)
            rows.append([idt,likes,retws,senti,date,txt])
        


df=pd.DataFrame(rows,columns=["idT","Likes","ReTweets","Positividad","fecha","txt"])

df["fecha"]=pd.to_datetime(df["fecha"])

df.sort_values(by=['fecha'],inplace=True,ascending=True)
df.set_index('fecha',inplace=True)
df.to_csv(candidatoFile+'_data.csv',encoding='utf-8')

axes=df.drop(columns=['txt','idT']).plot(subplots=True,figsize=(18, 10),grid=True,sharex=True)
axes[2].axhline(y=0.5, color='r', linestyle='--')
plt.show()
print ("Cantidad de Tweets: ",df.shape[0])
Cantidad de Tweets:  113



Capuñay's messages

The following charts summarize the three variables measured for each Tweet:

  • Tweets from July 15 to August 31:
In [27]:
candidatoFile='EstherCapunay_ago'
with open(candidatoFile, 'r') as fd:
    tweets=json.load(fd)
    
tweetText=[]
for tw in tweets:
    try:
        tweetText.append((tw['id'],tw['text'],tw['created_at'],tw['favorite_count'],tw['retweet_count']))
    except KeyError:
        tweetText.append((tw['id'],tw['full_text'],tw['created_at'],tw['favorite_count'],tw['retweet_count']))
        
rows=[]
for idt,txt,date,likes,retws in tweetText:
    if not txt.startswith('RT'):
        txt = " ".join([word for word in txt.split()
                                    if 'http' not in word
                                        and not word.startswith('@')
                                        and not word.startswith('#')
                                        and word != 'RT'
                                    ])
        if not txt.startswith(('Te invito a darte','Te invito a ingresar')):
            senti=clf.predict(txt)
            rows.append([idt,likes,retws,senti,date,txt])
        


df=pd.DataFrame(rows,columns=["idT","Likes","ReTweets","Positividad","fecha","txt"])

df["fecha"]=pd.to_datetime(df["fecha"])

df.sort_values(by=['fecha'],inplace=True,ascending=True)
df.set_index('fecha',inplace=True)
df.to_csv(candidatoFile+'_data.csv',encoding='utf-8')

axes=df.drop(columns=['txt','idT']).plot(subplots=True,figsize=(18, 10),grid=True,sharex=True)
axes[2].axhline(y=0.5, color='r', linestyle='--')
plt.show()
print ("Cantidad de Tweets: ",df.shape[0])
Cantidad de Tweets:  205
  • Tweets from September 1 to September 23 (one day before the debate):
In [28]:
candidatoFile='EstherCapunay_predeb'
with open(candidatoFile, 'r') as fd:
    tweets=json.load(fd)
    
tweetText=[]
for tw in tweets:
    try:
        tweetText.append((tw['id'],tw['text'],tw['created_at'],tw['favorite_count'],tw['retweet_count']))
    except KeyError:
        tweetText.append((tw['id'],tw['full_text'],tw['created_at'],tw['favorite_count'],tw['retweet_count']))
        
rows=[]
for idt,txt,date,likes,retws in tweetText:
    if not txt.startswith('RT'):
        txt = " ".join([word for word in txt.split()
                                    if 'http' not in word
                                        and not word.startswith('@')
                                        and not word.startswith('#')
                                        and word != 'RT'
                                    ])
        if not txt.startswith(('Te invito a darte','Te invito a ingresar')):
            senti=clf.predict(txt)
            rows.append([idt,likes,retws,senti,date,txt])
        


df=pd.DataFrame(rows,columns=["idT","Likes","ReTweets","Positividad","fecha","txt"])

df["fecha"]=pd.to_datetime(df["fecha"])

df.sort_values(by=['fecha'],inplace=True,ascending=True)
df.set_index('fecha',inplace=True)
df.to_csv(candidatoFile+'_data.csv',encoding='utf-8')

axes=df.drop(columns=['txt','idT']).plot(subplots=True,figsize=(18, 10),grid=True,sharex=True)
axes[2].axhline(y=0.5, color='r', linestyle='--')
plt.show()
print ("Cantidad de Tweets: ",df.shape[0])
Cantidad de Tweets:  100



Beingolea's messages

The following charts summarize the three variables measured for each Tweet:

  • Tweets from July 15 to August 31:
In [29]:
candidatoFile='BeingoleaA_ago'
with open(candidatoFile, 'r') as fd:
    tweets=json.load(fd)
    
tweetText=[]
for tw in tweets:
    try:
        tweetText.append((tw['id'],tw['text'],tw['created_at'],tw['favorite_count'],tw['retweet_count']))
    except KeyError:
        tweetText.append((tw['id'],tw['full_text'],tw['created_at'],tw['favorite_count'],tw['retweet_count']))
        
rows=[]
for idt,txt,date,likes,retws in tweetText:
    if not txt.startswith('RT'):
        txt = " ".join([word for word in txt.split()
                                    if 'http' not in word
                                        and not word.startswith('@')
                                        and not word.startswith('#')
                                        and word != 'RT'
                                    ])
        if not txt.startswith(('Te invito a darte','Te invito a ingresar')):
            senti=clf.predict(txt)
            rows.append([idt,likes,retws,senti,date,txt])
        


df=pd.DataFrame(rows,columns=["idT","Likes","ReTweets","Positividad","fecha","txt"])

df["fecha"]=pd.to_datetime(df["fecha"])

df.sort_values(by=['fecha'],inplace=True,ascending=True)
df.set_index('fecha',inplace=True)
df.to_csv(candidatoFile+'_data.csv',encoding='utf-8')

axes=df.drop(columns=['txt','idT']).plot(subplots=True,figsize=(18, 10),grid=True,sharex=True)
axes[2].axhline(y=0.5, color='r', linestyle='--')
plt.show()
print ("Cantidad de Tweets: ",df.shape[0])
Cantidad de Tweets:  223
  • Tweets from September 1 to September 23 (one day before the debate):
In [30]:
candidatoFile='BeingoleaA_predeb'
with open(candidatoFile, 'r') as fd:
    tweets=json.load(fd)
    
tweetText=[]
for tw in tweets:
    try:
        tweetText.append((tw['id'],tw['text'],tw['created_at'],tw['favorite_count'],tw['retweet_count']))
    except KeyError:
        tweetText.append((tw['id'],tw['full_text'],tw['created_at'],tw['favorite_count'],tw['retweet_count']))
        
rows=[]
for idt,txt,date,likes,retws in tweetText:
    if not txt.startswith('RT'):
        txt = " ".join([word for word in txt.split()
                                    if 'http' not in word
                                        and not word.startswith('@')
                                        and not word.startswith('#')
                                        and word != 'RT'
                                    ])
        if not txt.startswith(('Te invito a darte','Te invito a ingresar')):
            senti=clf.predict(txt)
            rows.append([idt,likes,retws,senti,date,txt])
        


df=pd.DataFrame(rows,columns=["idT","Likes","ReTweets","Positividad","fecha","txt"])

df["fecha"]=pd.to_datetime(df["fecha"])

df.sort_values(by=['fecha'],inplace=True,ascending=True)
df.set_index('fecha',inplace=True)
df.to_csv(candidatoFile+'_data.csv',encoding='utf-8')

axes=df.drop(columns=['txt','idT']).plot(subplots=True,figsize=(18, 10),grid=True,sharex=True)
axes[2].axhline(y=0.5, color='r', linestyle='--')
plt.show()
print ("Cantidad de Tweets: ",df.shape[0])
Cantidad de Tweets:  149



Castañeda's messages

The following charts summarize the three variables measured for each Tweet:

  • Tweets from July 15 to August 31:
In [31]:
candidatoFile='son_ago'
with open(candidatoFile, 'r') as fd:
    tweets=json.load(fd)
    
tweetText=[]
for tw in tweets:
    try:
        tweetText.append((tw['id'],tw['text'],tw['created_at'],tw['favorite_count'],tw['retweet_count']))
    except KeyError:
        tweetText.append((tw['id'],tw['full_text'],tw['created_at'],tw['favorite_count'],tw['retweet_count']))
        
rows=[]
for idt,txt,date,likes,retws in tweetText:
    if not txt.startswith('RT'):
        txt = " ".join([word for word in txt.split()
                                    if 'http' not in word
                                        and not word.startswith('@')
                                        and not word.startswith('#')
                                        and word != 'RT'
                                    ])
        if not txt.startswith(('Te invito a darte','Te invito a ingresar')):
            senti=clf.predict(txt)
            rows.append([idt,likes,retws,senti,date,txt])
        


df=pd.DataFrame(rows,columns=["idT","Likes","ReTweets","Positividad","fecha","txt"])

df["fecha"]=pd.to_datetime(df["fecha"])

df.sort_values(by=['fecha'],inplace=True,ascending=True)
df.set_index('fecha',inplace=True)
df.to_csv(candidatoFile+'_data.csv',encoding='utf-8')

axes=df.drop(columns=['txt','idT']).plot(subplots=True,figsize=(18, 10),grid=True,sharex=True)
axes[2].axhline(y=0.5, color='r', linestyle='--')
plt.show()
print ("Cantidad de Tweets: ",df.shape[0])
Cantidad de Tweets:  50
  • Tweets from September 1 to September 23 (one day before the debate):
In [32]:
candidatoFile='son_predeb'
with open(candidatoFile, 'r') as fd:
    tweets=json.load(fd)
    
tweetText=[]
for tw in tweets:
    try:
        tweetText.append((tw['id'],tw['text'],tw['created_at'],tw['favorite_count'],tw['retweet_count']))
    except KeyError:
        tweetText.append((tw['id'],tw['full_text'],tw['created_at'],tw['favorite_count'],tw['retweet_count']))
        
rows=[]
for idt,txt,date,likes,retws in tweetText:
    if not txt.startswith('RT'):
        txt = " ".join([word for word in txt.split()
                                    if 'http' not in word
                                        and not word.startswith('@')
                                        and not word.startswith('#')
                                        and word != 'RT'
                                    ])
        if not txt.startswith(('Te invito a darte','Te invito a ingresar')):
            senti=clf.predict(txt)
            rows.append([idt,likes,retws,senti,date,txt])
        


df=pd.DataFrame(rows,columns=["idT","Likes","ReTweets","Positividad","fecha","txt"])

df["fecha"]=pd.to_datetime(df["fecha"])

df.sort_values(by=['fecha'],inplace=True,ascending=True)
df.set_index('fecha',inplace=True)
df.to_csv(candidatoFile+'_data.csv',encoding='utf-8')

axes=df.drop(columns=['txt','idT']).plot(subplots=True,figsize=(18, 10),grid=True,sharex=True)
axes[2].axhline(y=0.5, color='r', linestyle='--')
plt.show()
print ("Cantidad de Tweets: ",df.shape[0])
Cantidad de Tweets:  40



Jorge Muñoz's messages

The following charts summarize the three variables measured for each Tweet:

  • Tweets from July 15 to August 31:
In [33]:
candidatoFile='JorgeMunozAP_ago'
with open(candidatoFile, 'r') as fd:
    tweets=json.load(fd)
    
tweetText=[]
for tw in tweets:
    try:
        tweetText.append((tw['id'],tw['text'],tw['created_at'],tw['favorite_count'],tw['retweet_count']))
    except KeyError:
        tweetText.append((tw['id'],tw['full_text'],tw['created_at'],tw['favorite_count'],tw['retweet_count']))
        
rows=[]
for idt,txt,date,likes,retws in tweetText:
    if not txt.startswith('RT'):
        txt = " ".join([word for word in txt.split()
                                    if 'http' not in word
                                        and not word.startswith('@')
                                        and not word.startswith('#')
                                        and word != 'RT'
                                    ])
        if not txt.startswith(('Te invito a darte','Te invito a ingresar')):
            senti=clf.predict(txt)
            rows.append([idt,likes,retws,senti,date,txt])
        


df=pd.DataFrame(rows,columns=["idT","Likes","ReTweets","Positividad","fecha","txt"])

df["fecha"]=pd.to_datetime(df["fecha"])

df.sort_values(by=['fecha'],inplace=True,ascending=True)
df.set_index('fecha',inplace=True)
df.to_csv(candidatoFile+'_data.csv',encoding='utf-8')

axes=df.drop(columns=['txt','idT']).plot(subplots=True,figsize=(18, 10),grid=True,sharex=True)
axes[2].axhline(y=0.5, color='r', linestyle='--')
plt.show()
print ("Cantidad de Tweets: ",df.shape[0])
Cantidad de Tweets:  259
  • Tweets from September 1 to September 23 (one day before the debate):
In [34]:
candidatoFile='JorgeMunozAP_predeb'
with open(candidatoFile, 'r') as fd:
    tweets=json.load(fd)
    
tweetText=[]
for tw in tweets:
    try:
        tweetText.append((tw['id'],tw['text'],tw['created_at'],tw['favorite_count'],tw['retweet_count']))
    except KeyError:
        tweetText.append((tw['id'],tw['full_text'],tw['created_at'],tw['favorite_count'],tw['retweet_count']))
        
rows=[]
for idt,txt,date,likes,retws in tweetText:
    if not txt.startswith('RT'):
        txt = " ".join([word for word in txt.split()
                                    if 'http' not in word
                                        and not word.startswith('@')
                                        and not word.startswith('#')
                                        and word != 'RT'
                                    ])
        if not txt.startswith(('Te invito a darte','Te invito a ingresar')):
            senti=clf.predict(txt)
            rows.append([idt,likes,retws,senti,date,txt])
        


df=pd.DataFrame(rows,columns=["idT","Likes","ReTweets","Positividad","fecha","txt"])

df["fecha"]=pd.to_datetime(df["fecha"])

df.sort_values(by=['fecha'],inplace=True,ascending=True)
df.set_index('fecha',inplace=True)
df.to_csv(candidatoFile+'_data.csv',encoding='utf-8')

axes=df.drop(columns=['txt','idT']).plot(subplots=True,figsize=(18, 10),grid=True,sharex=True)
axes[2].axhline(y=0.5, color='r', linestyle='--')
plt.show()
print ("Cantidad de Tweets: ",df.shape[0])
Cantidad de Tweets:  245



Jaime Salinas's messages

The following charts summarize the three variables measured for each Tweet:

  • Tweets from July 15 to August 31:
In [35]:
candidatoFile='jaimesalinas80_ago'
with open(candidatoFile, 'r') as fd:
    tweets=json.load(fd)
    
tweetText=[]
for tw in tweets:
    try:
        tweetText.append((tw['id'],tw['text'],tw['created_at'],tw['favorite_count'],tw['retweet_count']))
    except KeyError:
        tweetText.append((tw['id'],tw['full_text'],tw['created_at'],tw['favorite_count'],tw['retweet_count']))
        
rows=[]
for idt,txt,date,likes,retws in tweetText:
    if not txt.startswith('RT'):
        txt = " ".join([word for word in txt.split()
                                    if 'http' not in word
                                        and not word.startswith('@')
                                        and not word.startswith('#')
                                        and word != 'RT'
                                    ])
        if not txt.startswith(('Te invito a darte','Te invito a ingresar')):
            senti=clf.predict(txt)
            rows.append([idt,likes,retws,senti,date,txt])
        


df=pd.DataFrame(rows,columns=["idT","Likes","ReTweets","Positividad","fecha","txt"])

df["fecha"]=pd.to_datetime(df["fecha"])

df.sort_values(by=['fecha'],inplace=True,ascending=True)
df.set_index('fecha',inplace=True)
df.to_csv(candidatoFile+'_data.csv',encoding='utf-8')

axes=df.drop(columns=['txt','idT']).plot(subplots=True,figsize=(18, 10),grid=True,sharex=True)
axes[2].axhline(y=0.5, color='r', linestyle='--')
plt.show()
print ("Cantidad de Tweets: ",df.shape[0])
Cantidad de Tweets:  34
  • Tweets from September 1 to September 23 (one day before the debate):
In [36]:
candidatoFile='jaimesalinas80_predeb'
with open(candidatoFile, 'r') as fd:
    tweets=json.load(fd)
    
tweetText=[]
for tw in tweets:
    try:
        tweetText.append((tw['id'],tw['text'],tw['created_at'],tw['favorite_count'],tw['retweet_count']))
    except KeyError:
        tweetText.append((tw['id'],tw['full_text'],tw['created_at'],tw['favorite_count'],tw['retweet_count']))
        
rows=[]
for idt,txt,date,likes,retws in tweetText:
    if not txt.startswith('RT'):
        txt = " ".join([word for word in txt.split()
                                    if 'http' not in word
                                        and not word.startswith('@')
                                        and not word.startswith('#')
                                        and word != 'RT'
                                    ])
        if not txt.startswith(('Te invito a darte','Te invito a ingresar')):
            senti=clf.predict(txt)
            rows.append([idt,likes,retws,senti,date,txt])
        


df=pd.DataFrame(rows,columns=["idT","Likes","ReTweets","Positividad","fecha","txt"])

df["fecha"]=pd.to_datetime(df["fecha"])

df.sort_values(by=['fecha'],inplace=True,ascending=True)
df.set_index('fecha',inplace=True)
df.to_csv(candidatoFile+'_data.csv',encoding='utf-8')

axes=df.drop(columns=['txt','idT']).plot(subplots=True,figsize=(18, 10),grid=True,sharex=True)
axes[2].axhline(y=0.5, color='r', linestyle='--')
plt.show()
print ("Cantidad de Tweets: ",df.shape[0])
Cantidad de Tweets:  52



Julio Gago's messages

The following charts summarize the three variables measured for each Tweet:

  • Tweets from July 15 to August 31:
In [37]:
candidatoFile='JulioGagoPe_ago'
with open(candidatoFile, 'r') as fd:
    tweets=json.load(fd)
    
tweetText=[]
for tw in tweets:
    try:
        tweetText.append((tw['id'],tw['text'],tw['created_at'],tw['favorite_count'],tw['retweet_count']))
    except KeyError:
        tweetText.append((tw['id'],tw['full_text'],tw['created_at'],tw['favorite_count'],tw['retweet_count']))
        
rows=[]
for idt,txt,date,likes,retws in tweetText:
    if not txt.startswith('RT'):
        txt = " ".join([word for word in txt.split()
                                    if 'http' not in word
                                        and not word.startswith('@')
                                        and not word.startswith('#')
                                        and word != 'RT'
                                    ])
        if not txt.startswith(('Te invito a darte','Te invito a ingresar')):
            senti=clf.predict(txt)
            rows.append([idt,likes,retws,senti,date,txt])
        


df=pd.DataFrame(rows,columns=["idT","Likes","ReTweets","Positividad","fecha","txt"])

df["fecha"]=pd.to_datetime(df["fecha"])

df.sort_values(by=['fecha'],inplace=True,ascending=True)
df.set_index('fecha',inplace=True)
df.to_csv(candidatoFile+'_data.csv',encoding='utf-8')

axes=df.drop(columns=['txt','idT']).plot(subplots=True,figsize=(18, 10),grid=True,sharex=True)
axes[2].axhline(y=0.5, color='r', linestyle='--')
plt.show()
print ("Cantidad de Tweets: ",df.shape[0])
Cantidad de Tweets:  56
  • Tweets from September 1 to September 23 (one day before the debate):
In [38]:
candidatoFile='JulioGagoPe_predeb'
with open(candidatoFile, 'r') as fd:
    tweets=json.load(fd)
    
tweetText=[]
for tw in tweets:
    try:
        tweetText.append((tw['id'],tw['text'],tw['created_at'],tw['favorite_count'],tw['retweet_count']))
    except KeyError:
        tweetText.append((tw['id'],tw['full_text'],tw['created_at'],tw['favorite_count'],tw['retweet_count']))
        
rows=[]
for idt,txt,date,likes,retws in tweetText:
    if not txt.startswith('RT'):
        txt = " ".join([word for word in txt.split()
                                    if 'http' not in word
                                        and not word.startswith('@')
                                        and not word.startswith('#')
                                        and word != 'RT'
                                    ])
        if not txt.startswith(('Te invito a darte','Te invito a ingresar')):
            senti=clf.predict(txt)
            rows.append([idt,likes,retws,senti,date,txt])
        


df=pd.DataFrame(rows,columns=["idT","Likes","ReTweets","Positividad","fecha","txt"])

df["fecha"]=pd.to_datetime(df["fecha"])

df.sort_values(by=['fecha'],inplace=True,ascending=True)
df.set_index('fecha',inplace=True)
df.to_csv(candidatoFile+'_data.csv',encoding='utf-8')

axes=df.drop(columns=['txt','idT']).plot(subplots=True,figsize=(18, 10),grid=True,sharex=True)
axes[2].axhline(y=0.5, color='r', linestyle='--')
plt.show()
print ("Cantidad de Tweets: ",df.shape[0])
Cantidad de Tweets:  98



Enrique Cornejo's messages

The following charts summarize the three variables measured for each Tweet:

  • Tweets from July 15 to August 31:
In [39]:
candidatoFile='ENRIQUECORNEJOR_ago'
with open(candidatoFile, 'r') as fd:
    tweets=json.load(fd)
    
tweetText=[]
for tw in tweets:
    try:
        tweetText.append((tw['id'],tw['text'],tw['created_at'],tw['favorite_count'],tw['retweet_count']))
    except KeyError:
        tweetText.append((tw['id'],tw['full_text'],tw['created_at'],tw['favorite_count'],tw['retweet_count']))
        
rows=[]
for idt,txt,date,likes,retws in tweetText:
    if not txt.startswith('RT'):
        txt = " ".join([word for word in txt.split()
                                    if 'http' not in word
                                        and not word.startswith('@')
                                        and not word.startswith('#')
                                        and word != 'RT'
                                    ])
        if not txt.startswith(('Te invito a darte','Te invito a ingresar')):
            senti=clf.predict(txt)
            rows.append([idt,likes,retws,senti,date,txt])
        


df=pd.DataFrame(rows,columns=["idT","Likes","ReTweets","Positividad","fecha","txt"])

df["fecha"]=pd.to_datetime(df["fecha"])

df.sort_values(by=['fecha'],inplace=True,ascending=True)
df.set_index('fecha',inplace=True)
df.to_csv(candidatoFile+'_data.csv',encoding='utf-8')

axes=df.drop(columns=['txt','idT']).plot(subplots=True,figsize=(18, 10),grid=True,sharex=True)
axes[2].axhline(y=0.5, color='r', linestyle='--')
plt.show()
print ("Cantidad de Tweets: ",df.shape[0])
Cantidad de Tweets:  244
  • Tweets from September 1 to September 23 (one day before the debate):
In [40]:
candidatoFile='ENRIQUECORNEJOR_predeb'
with open(candidatoFile, 'r') as fd:
    tweets=json.load(fd)
    
tweetText=[]
for tw in tweets:
    try:
        tweetText.append((tw['id'],tw['text'],tw['created_at'],tw['favorite_count'],tw['retweet_count']))
    except KeyError:
        tweetText.append((tw['id'],tw['full_text'],tw['created_at'],tw['favorite_count'],tw['retweet_count']))
        
rows=[]
for idt,txt,date,likes,retws in tweetText:
    if not txt.startswith('RT'):
        txt = " ".join([word for word in txt.split()
                                    if 'http' not in word
                                        and not word.startswith('@')
                                        and not word.startswith('#')
                                        and word != 'RT'
                                    ])
        if not txt.startswith(('Te invito a darte','Te invito a ingresar')):
            senti=clf.predict(txt)
            rows.append([idt,likes,retws,senti,date,txt])
        


df=pd.DataFrame(rows,columns=["idT","Likes","ReTweets","Positividad","fecha","txt"])

df["fecha"]=pd.to_datetime(df["fecha"])

df.sort_values(by=['fecha'],inplace=True,ascending=True)
df.set_index('fecha',inplace=True)
df.to_csv(candidatoFile+'_data.csv',encoding='utf-8')

axes=df.drop(columns=['txt','idT']).plot(subplots=True,figsize=(18, 10),grid=True,sharex=True)
axes[2].axhline(y=0.5, color='r', linestyle='--')
plt.show()
print ("Cantidad de Tweets: ",df.shape[0])
Cantidad de Tweets:  110
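
Each cell reads the Tweet body from `text` and falls back to `full_text` when that key is absent. A minimal sketch of that lookup without a `try`/`except` follows; the function name `tweet_fields` and the sample dictionaries are hypothetical, and it assumes every Tweet carries one of the two keys.

```python
def tweet_fields(tw):
    """Return (id, body, created_at, favorite_count, retweet_count) for one Tweet dict."""
    # Assumption: the body is stored under 'text' or, failing that, 'full_text'.
    if 'text' in tw:
        body = tw['text']
    else:
        body = tw['full_text']
    return (tw['id'], body, tw['created_at'],
            tw['favorite_count'], tw['retweet_count'])


# Illustrative records, not taken from the data set.
short = {'id': 1, 'text': 'hola', 'created_at': 'Sat Sep 01 12:00:00 +0000 2018',
         'favorite_count': 3, 'retweet_count': 1}
extended = {'id': 2, 'full_text': 'hola de nuevo',
            'created_at': 'Sun Sep 02 12:00:00 +0000 2018',
            'favorite_count': 5, 'retweet_count': 0}
print(tweet_fields(short)[1], '|', tweet_fields(extended)[1])
```

Testing for the key explicitly keeps an unrelated missing field, such as `id`, from being mistaken for a missing `text`.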



Messages from Enrique

The following plots summarize the three variables measured for each Tweet:

  • Tweets from July 15 to August 31:
In [41]:
candidatoFile='EnriquePorLima_ago'
with open(candidatoFile, 'r') as fd:
    tweets=json.load(fd)
    
tweetText=[]
for tw in tweets:
    try:
        tweetText.append((tw['id'],tw['text'],tw['created_at'],tw['favorite_count'],tw['retweet_count']))
    except KeyError:
        tweetText.append((tw['id'],tw['full_text'],tw['created_at'],tw['favorite_count'],tw['retweet_count']))
        
rows=[]
for idt,txt,date,likes,retws in tweetText:
    if not txt.startswith('RT'):
        txt = " ".join([word for word in txt.split()
                                    if 'http' not in word
                                        and not word.startswith('@')
                                        and not word.startswith('#')
                                        and word != 'RT'
                                    ])
        if not txt.startswith(('Te invito a darte','Te invito a ingresar')):
            senti=clf.predict(txt)
            rows.append([idt,likes,retws,senti,date,txt])
        


df=pd.DataFrame(rows,columns=["idT","Likes","ReTweets","Positividad","fecha","txt"])

df["fecha"]=pd.to_datetime(df["fecha"])

df.sort_values(by=['fecha'],inplace=True,ascending=True)
df.set_index('fecha',inplace=True)
df.to_csv(candidatoFile+'_data.csv',encoding='utf-8')

axes=df.drop(columns=['txt','idT']).plot(subplots=True,figsize=(18, 10),grid=True,sharex=True)
axes[2].axhline(y=0.5, color='r', linestyle='--')
plt.show()
print ("Cantidad de Tweets: ",df.shape[0])
Cantidad de Tweets:  12
  • Tweets from September 1 to September 23 (one day before the debate):
In [42]:
candidatoFile='EnriquePorLima_predeb'
with open(candidatoFile, 'r') as fd:
    tweets=json.load(fd)
    
tweetText=[]
for tw in tweets:
    try:
        tweetText.append((tw['id'],tw['text'],tw['created_at'],tw['favorite_count'],tw['retweet_count']))
    except KeyError:
        tweetText.append((tw['id'],tw['full_text'],tw['created_at'],tw['favorite_count'],tw['retweet_count']))
        
rows=[]
for idt,txt,date,likes,retws in tweetText:
    if not txt.startswith('RT'):
        txt = " ".join([word for word in txt.split()
                                    if 'http' not in word
                                        and not word.startswith('@')
                                        and not word.startswith('#')
                                        and word != 'RT'
                                    ])
        if not txt.startswith(('Te invito a darte','Te invito a ingresar')):
            senti=clf.predict(txt)
            rows.append([idt,likes,retws,senti,date,txt])
        


df=pd.DataFrame(rows,columns=["idT","Likes","ReTweets","Positividad","fecha","txt"])

df["fecha"]=pd.to_datetime(df["fecha"])

df.sort_values(by=['fecha'],inplace=True,ascending=True)
df.set_index('fecha',inplace=True)
df.to_csv(candidatoFile+'_data.csv',encoding='utf-8')

axes=df.drop(columns=['txt','idT']).plot(subplots=True,figsize=(18, 10),grid=True,sharex=True)
axes[2].axhline(y=0.5, color='r', linestyle='--')
plt.show()
print ("Cantidad de Tweets: ",df.shape[0])
Cantidad de Tweets:  8
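
The dashed red line drawn at 0.5 on the `Positividad` panel suggests a simple summary: the share of Tweets scoring above that line. The sketch below assumes the classifier returns a score between 0 and 1 (an assumption, not confirmed in this section); `share_above` is a hypothetical helper and the scores are made up for illustration.

```python
def share_above(scores, threshold=0.5):
    """Return the fraction of scores strictly above the threshold, or None if empty."""
    scores = list(scores)
    if not scores:
        return None
    return sum(1 for s in scores if s > threshold) / len(scores)


# Made-up scores for illustration only; in the notebook one could pass df["Positividad"].
example_scores = [0.1, 0.4, 0.6, 0.9]
print(share_above(example_scores))
```

With very few Tweets in a block, as in the two cells above, such a share is sensitive to each single message and should be read with care.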



Messages from Dietel Columbus

The following plots summarize the three variables measured for each Tweet:

  • Tweets from July 15 to August 31:
In [43]:
candidatoFile='DitelColumbus_ago'
with open(candidatoFile, 'r') as fd:
    tweets=json.load(fd)
    
tweetText=[]
for tw in tweets:
    try:
        tweetText.append((tw['id'],tw['text'],tw['created_at'],tw['favorite_count'],tw['retweet_count']))
    except KeyError:
        tweetText.append((tw['id'],tw['full_text'],tw['created_at'],tw['favorite_count'],tw['retweet_count']))
        
rows=[]
for idt,txt,date,likes,retws in tweetText:
    if not txt.startswith('RT'):
        txt = " ".join([word for word in txt.split()
                                    if 'http' not in word
                                        and not word.startswith('@')
                                        and not word.startswith('#')
                                        and word != 'RT'
                                    ])
        if not txt.startswith(('Te invito a darte','Te invito a ingresar')):
            senti=clf.predict(txt)
            rows.append([idt,likes,retws,senti,date,txt])
        


df=pd.DataFrame(rows,columns=["idT","Likes","ReTweets","Positividad","fecha","txt"])

df["fecha"]=pd.to_datetime(df["fecha"])

df.sort_values(by=['fecha'],inplace=True,ascending=True)
df.set_index('fecha',inplace=True)
df.to_csv(candidatoFile+'_data.csv',encoding='utf-8')

axes=df.drop(columns=['txt','idT']).plot(subplots=True,figsize=(18, 10),grid=True,sharex=True)
axes[2].axhline(y=0.5, color='r', linestyle='--')
plt.show()
print ("Cantidad de Tweets: ",df.shape[0])
Cantidad de Tweets:  245
  • Tweets from September 1 to September 23 (one day before the debate):
In [44]:
candidatoFile='DitelColumbus_predeb'
with open(candidatoFile, 'r') as fd:
    tweets=json.load(fd)
    
tweetText=[]
for tw in tweets:
    try:
        tweetText.append((tw['id'],tw['text'],tw['created_at'],tw['favorite_count'],tw['retweet_count']))
    except KeyError:
        tweetText.append((tw['id'],tw['full_text'],tw['created_at'],tw['favorite_count'],tw['retweet_count']))
        
rows=[]
for idt,txt,date,likes,retws in tweetText:
    if not txt.startswith('RT'):
        txt = " ".join([word for word in txt.split()
                                    if 'http' not in word
                                        and not word.startswith('@')
                                        and not word.startswith('#')
                                        and word != 'RT'
                                    ])
        if not txt.startswith(('Te invito a darte','Te invito a ingresar')):
            senti=clf.predict(txt)
            rows.append([idt,likes,retws,senti,date,txt])
        


df=pd.DataFrame(rows,columns=["idT","Likes","ReTweets","Positividad","fecha","txt"])

df["fecha"]=pd.to_datetime(df["fecha"])

df.sort_values(by=['fecha'],inplace=True,ascending=True)
df.set_index('fecha',inplace=True)
df.to_csv(candidatoFile+'_data.csv',encoding='utf-8')

axes=df.drop(columns=['txt','idT']).plot(subplots=True,figsize=(18, 10),grid=True,sharex=True)
axes[2].axhline(y=0.5, color='r', linestyle='--')
plt.show()
print ("Cantidad de Tweets: ",df.shape[0])
Cantidad de Tweets:  97



Messages from Gustavo Guerra García

The following plots summarize the three variables measured for each Tweet:

  • Tweets from July 15 to August 31:
In [45]:
candidatoFile='GGG_pe_ago'
with open(candidatoFile, 'r') as fd:
    tweets=json.load(fd)
    
tweetText=[]
for tw in tweets:
    try:
        tweetText.append((tw['id'],tw['text'],tw['created_at'],tw['favorite_count'],tw['retweet_count']))
    except KeyError:
        tweetText.append((tw['id'],tw['full_text'],tw['created_at'],tw['favorite_count'],tw['retweet_count']))
        
rows=[]
for idt,txt,date,likes,retws in tweetText:
    if not txt.startswith('RT'):
        txt = " ".join([word for word in txt.split()
                                    if 'http' not in word
                                        and not word.startswith('@')
                                        and not word.startswith('#')
                                        and word != 'RT'
                                    ])
        if not txt.startswith(('Te invito a darte','Te invito a ingresar')):
            senti=clf.predict(txt)
            rows.append([idt,likes,retws,senti,date,txt])
        


df=pd.DataFrame(rows,columns=["idT","Likes","ReTweets","Positividad","fecha","txt"])

df["fecha"]=pd.to_datetime(df["fecha"])

df.sort_values(by=['fecha'],inplace=True,ascending=True)
df.set_index('fecha',inplace=True)
df.to_csv(candidatoFile+'_data.csv',encoding='utf-8')

axes=df.drop(columns=['txt','idT']).plot(subplots=True,figsize=(18, 10),grid=True,sharex=True)
axes[2].axhline(y=0.5, color='r', linestyle='--')
plt.show()
print ("Cantidad de Tweets: ",df.shape[0])
Cantidad de Tweets:  258
  • Tweets from September 1 to September 23 (one day before the debate):
In [46]:
candidatoFile='GGG_pe_predeb'
with open(candidatoFile, 'r') as fd:
    tweets=json.load(fd)
    
tweetText=[]
for tw in tweets:
    try:
        tweetText.append((tw['id'],tw['text'],tw['created_at'],tw['favorite_count'],tw['retweet_count']))
    except KeyError:
        tweetText.append((tw['id'],tw['full_text'],tw['created_at'],tw['favorite_count'],tw['retweet_count']))
        
rows=[]
for idt,txt,date,likes,retws in tweetText:
    if not txt.startswith('RT'):
        txt = " ".join([word for word in txt.split()
                                    if 'http' not in word
                                        and not word.startswith('@')
                                        and not word.startswith('#')
                                        and word != 'RT'
                                    ])
        if not txt.startswith(('Te invito a darte','Te invito a ingresar')):
            senti=clf.predict(txt)
            rows.append([idt,likes,retws,senti,date,txt])
        


df=pd.DataFrame(rows,columns=["idT","Likes","ReTweets","Positividad","fecha","txt"])

df["fecha"]=pd.to_datetime(df["fecha"])

df.sort_values(by=['fecha'],inplace=True,ascending=True)
df.set_index('fecha',inplace=True)
df.to_csv(candidatoFile+'_data.csv',encoding='utf-8')

axes=df.drop(columns=['txt','idT']).plot(subplots=True,figsize=(18, 10),grid=True,sharex=True)
axes[2].axhline(y=0.5, color='r', linestyle='--')
plt.show()
print ("Cantidad de Tweets: ",df.shape[0])
Cantidad de Tweets:  319
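
Because the two blocks cover different numbers of days, raw counts are not directly comparable. A rough sketch of a per-day rate follows, using the two counts printed above (258 and 319, both after filtering) and assuming the blocks run over inclusive dates in 2018; `days_inclusive` is a hypothetical helper.

```python
from datetime import date


def days_inclusive(start, end):
    """Number of calendar days from start to end, counting both endpoints."""
    return (end - start).days + 1


# Block boundaries as stated in the notebook (year 2018 assumed).
first_block = days_inclusive(date(2018, 7, 15), date(2018, 8, 31))
second_block = days_inclusive(date(2018, 9, 1), date(2018, 9, 23))

# Filtered Tweet counts printed by the two cells above.
rate_first = 258 / first_block
rate_second = 319 / second_block

print("Days per block:", first_block, second_block)
print("Tweets per day:", rate_first, rate_second)
```

The same calculation applies to any other candidate by substituting the counts printed for that candidate's two cells.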



Messages from Juan Carlos Zurek

The following plots summarize the three variables measured for each Tweet:

  • Tweets from July 15 to August 31:
In [47]:
candidatoFile='juancarloszurek_ago'
with open(candidatoFile, 'r') as fd:
    tweets=json.load(fd)
    
tweetText=[]
for tw in tweets:
    try:
        tweetText.append((tw['id'],tw['text'],tw['created_at'],tw['favorite_count'],tw['retweet_count']))
    except KeyError:
        tweetText.append((tw['id'],tw['full_text'],tw['created_at'],tw['favorite_count'],tw['retweet_count']))
        
rows=[]
for idt,txt,date,likes,retws in tweetText:
    if not txt.startswith('RT'):
        txt = " ".join([word for word in txt.split()
                                    if 'http' not in word
                                        and not word.startswith('@')
                                        and not word.startswith('#')
                                        and word != 'RT'
                                    ])
        if not txt.startswith(('Te invito a darte','Te invito a ingresar')):
            senti=clf.predict(txt)
            rows.append([idt,likes,retws,senti,date,txt])
        


df=pd.DataFrame(rows,columns=["idT","Likes","ReTweets","Positividad","fecha","txt"])

df["fecha"]=pd.to_datetime(df["fecha"])

df.sort_values(by=['fecha'],inplace=True,ascending=True)
df.set_index('fecha',inplace=True)
df.to_csv(candidatoFile+'_data.csv',encoding='utf-8')

axes=df.drop(columns=['txt','idT']).plot(subplots=True,figsize=(18, 10),grid=True,sharex=True)
axes[2].axhline(y=0.5, color='r', linestyle='--')
plt.show()
print ("Cantidad de Tweets: ",df.shape[0])
Cantidad de Tweets:  172
  • Tweets from September 1 to September 23 (one day before the debate):
In [48]:
candidatoFile='juancarloszurek_predeb'
with open(candidatoFile, 'r') as fd:
    tweets=json.load(fd)
    
tweetText=[]
for tw in tweets:
    try:
        tweetText.append((tw['id'],tw['text'],tw['created_at'],tw['favorite_count'],tw['retweet_count']))
    except KeyError:
        tweetText.append((tw['id'],tw['full_text'],tw['created_at'],tw['favorite_count'],tw['retweet_count']))
        
rows=[]
for idt,txt,date,likes,retws in tweetText:
    if not txt.startswith('RT'):
        txt = " ".join([word for word in txt.split()
                                    if 'http' not in word
                                        and not word.startswith('@')
                                        and not word.startswith('#')
                                        and word != 'RT'
                                    ])
        if not txt.startswith(('Te invito a darte','Te invito a ingresar')):
            senti=clf.predict(txt)
            rows.append([idt,likes,retws,senti,date,txt])
        


df=pd.DataFrame(rows,columns=["idT","Likes","ReTweets","Positividad","fecha","txt"])

df["fecha"]=pd.to_datetime(df["fecha"])

df.sort_values(by=['fecha'],inplace=True,ascending=True)
df.set_index('fecha',inplace=True)
df.to_csv(candidatoFile+'_data.csv',encoding='utf-8')

axes=df.drop(columns=['txt','idT']).plot(subplots=True,figsize=(18, 10),grid=True,sharex=True)
axes[2].axhline(y=0.5, color='r', linestyle='--')
plt.show()
print ("Cantidad de Tweets: ",df.shape[0])
Cantidad de Tweets:  127



Messages from Enrique Ocrospoma

The following plots summarize the three variables measured for each Tweet:

  • Tweets from July 15 to August 31:
In [49]:
candidatoFile='kikeocrospoma_ago'
with open(candidatoFile, 'r') as fd:
    tweets=json.load(fd)
    
tweetText=[]
for tw in tweets:
    try:
        tweetText.append((tw['id'],tw['text'],tw['created_at'],tw['favorite_count'],tw['retweet_count']))
    except KeyError:
        tweetText.append((tw['id'],tw['full_text'],tw['created_at'],tw['favorite_count'],tw['retweet_count']))
        
rows=[]
for idt,txt,date,likes,retws in tweetText:
    if not txt.startswith('RT'):
        txt = " ".join([word for word in txt.split()
                                    if 'http' not in word
                                        and not word.startswith('@')
                                        and not word.startswith('#')
                                        and word != 'RT'
                                    ])
        if not txt.startswith(('Te invito a darte','Te invito a ingresar')):
            senti=clf.predict(txt)
            rows.append([idt,likes,retws,senti,date,txt])
        


df=pd.DataFrame(rows,columns=["idT","Likes","ReTweets","Positividad","fecha","txt"])

df["fecha"]=pd.to_datetime(df["fecha"])

df.sort_values(by=['fecha'],inplace=True,ascending=True)
df.set_index('fecha',inplace=True)
df.to_csv(candidatoFile+'_data.csv',encoding='utf-8')

axes=df.drop(columns=['txt','idT']).plot(subplots=True,figsize=(18, 10),grid=True,sharex=True)
axes[2].axhline(y=0.5, color='r', linestyle='--')
plt.show()
print ("Cantidad de Tweets: ",df.shape[0])
Cantidad de Tweets:  167
  • Tweets from September 1 to September 23 (one day before the debate):
In [50]:
candidatoFile='kikeocrospoma_predeb'
with open(candidatoFile, 'r') as fd:
    tweets=json.load(fd)
    
tweetText=[]
for tw in tweets:
    try:
        tweetText.append((tw['id'],tw['text'],tw['created_at'],tw['favorite_count'],tw['retweet_count']))
    except KeyError:
        tweetText.append((tw['id'],tw['full_text'],tw['created_at'],tw['favorite_count'],tw['retweet_count']))
        
rows=[]
for idt,txt,date,likes,retws in tweetText:
    if not txt.startswith('RT'):
        txt = " ".join([word for word in txt.split()
                                    if 'http' not in word
                                        and not word.startswith('@')
                                        and not word.startswith('#')
                                        and word != 'RT'
                                    ])
        if not txt.startswith(('Te invito a darte','Te invito a ingresar')):
            senti=clf.predict(txt)
            rows.append([idt,likes,retws,senti,date,txt])
        


df=pd.DataFrame(rows,columns=["idT","Likes","ReTweets","Positividad","fecha","txt"])

df["fecha"]=pd.to_datetime(df["fecha"])

df.sort_values(by=['fecha'],inplace=True,ascending=True)
df.set_index('fecha',inplace=True)
df.to_csv(candidatoFile+'_data.csv',encoding='utf-8')

axes=df.drop(columns=['txt','idT']).plot(subplots=True,figsize=(18, 10),grid=True,sharex=True)
axes[2].axhline(y=0.5, color='r', linestyle='--')
plt.show()
print ("Cantidad de Tweets: ",df.shape[0])
Cantidad de Tweets:  26



Messages from Jorge Villacorta

The following plots summarize the three variables measured for each Tweet:

  • Tweets from July 15 to August 31:
In [51]:
candidatoFile='mikausape_ago'
with open(candidatoFile, 'r') as fd:
    tweets=json.load(fd)
    
tweetText=[]
for tw in tweets:
    try:
        tweetText.append((tw['id'],tw['text'],tw['created_at'],tw['favorite_count'],tw['retweet_count']))
    except KeyError:
        tweetText.append((tw['id'],tw['full_text'],tw['created_at'],tw['favorite_count'],tw['retweet_count']))
        
rows=[]
for idt,txt,date,likes,retws in tweetText:
    if not txt.startswith('RT'):
        txt = " ".join([word for word in txt.split()
                                    if 'http' not in word
                                        and not word.startswith('@')
                                        and not word.startswith('#')
                                        and word != 'RT'
                                    ])
        if not txt.startswith(('Te invito a darte','Te invito a ingresar')):
            senti=clf.predict(txt)
            rows.append([idt,likes,retws,senti,date,txt])
        


df=pd.DataFrame(rows,columns=["idT","Likes","ReTweets","Positividad","fecha","txt"])

df["fecha"]=pd.to_datetime(df["fecha"])

df.sort_values(by=['fecha'],inplace=True,ascending=True)
df.set_index('fecha',inplace=True)
df.to_csv(candidatoFile+'_data.csv',encoding='utf-8')

axes=df.drop(columns=['txt','idT']).plot(subplots=True,figsize=(18, 10),grid=True,sharex=True)
axes[2].axhline(y=0.5, color='r', linestyle='--')
plt.show()
print ("Cantidad de Tweets: ",df.shape[0])
Cantidad de Tweets:  109
  • Tweets from September 1 to September 23 (one day before the debate):
In [52]:
candidatoFile='mikausape_predeb'
with open(candidatoFile, 'r') as fd:
    tweets=json.load(fd)
    
tweetText=[]
for tw in tweets:
    try:
        tweetText.append((tw['id'],tw['text'],tw['created_at'],tw['favorite_count'],tw['retweet_count']))
    except KeyError:
        tweetText.append((tw['id'],tw['full_text'],tw['created_at'],tw['favorite_count'],tw['retweet_count']))
        
rows=[]
for idt,txt,date,likes,retws in tweetText:
    if not txt.startswith('RT'):
        txt = " ".join([word for word in txt.split()
                                    if 'http' not in word
                                        and not word.startswith('@')
                                        and not word.startswith('#')
                                        and word != 'RT'
                                    ])
        if not txt.startswith(('Te invito a darte','Te invito a ingresar')):
            senti=clf.predict(txt)
            rows.append([idt,likes,retws,senti,date,txt])
        


df=pd.DataFrame(rows,columns=["idT","Likes","ReTweets","Positividad","fecha","txt"])

df["fecha"]=pd.to_datetime(df["fecha"])

df.sort_values(by=['fecha'],inplace=True,ascending=True)
df.set_index('fecha',inplace=True)
df.to_csv(candidatoFile+'_data.csv',encoding='utf-8')

axes=df.drop(columns=['txt','idT']).plot(subplots=True,figsize=(18, 10),grid=True,sharex=True)
axes[2].axhline(y=0.5, color='r', linestyle='--')
plt.show()
print ("Cantidad de Tweets: ",df.shape[0])
Cantidad de Tweets:  119



Messages from Manuel Velarde

The following plots summarize the three variables measured for each Tweet:

  • Tweets from July 15 to August 31:
In [53]:
candidatoFile='ManuelVelardeD_ago'
with open(candidatoFile, 'r') as fd:
    tweets=json.load(fd)
    
tweetText=[]
for tw in tweets:
    try:
        tweetText.append((tw['id'],tw['text'],tw['created_at'],tw['favorite_count'],tw['retweet_count']))
    except KeyError:
        tweetText.append((tw['id'],tw['full_text'],tw['created_at'],tw['favorite_count'],tw['retweet_count']))
        
rows=[]
for idt,txt,date,likes,retws in tweetText:
    if not txt.startswith('RT'):
        txt = " ".join([word for word in txt.split()
                                    if 'http' not in word
                                        and not word.startswith('@')
                                        and not word.startswith('#')
                                        and word != 'RT'
                                    ])
        if not txt.startswith(('Te invito a darte','Te invito a ingresar')):
            senti=clf.predict(txt)
            rows.append([idt,likes,retws,senti,date,txt])
        


df=pd.DataFrame(rows,columns=["idT","Likes","ReTweets","Positividad","fecha","txt"])

df["fecha"]=pd.to_datetime(df["fecha"])

df.sort_values(by=['fecha'],inplace=True,ascending=True)
df.set_index('fecha',inplace=True)
df.to_csv(candidatoFile+'_data.csv',encoding='utf-8')

axes=df.drop(columns=['txt','idT']).plot(subplots=True,figsize=(18, 10),grid=True,sharex=True)
axes[2].axhline(y=0.5, color='r', linestyle='--')
plt.show()
print ("Cantidad de Tweets: ",df.shape[0])
Cantidad de Tweets:  104
  • Tweets from September 1 to September 23 (one day before the debate):
In [54]:
candidatoFile='ManuelVelardeD_predeb'
with open(candidatoFile, 'r') as fd:
    tweets=json.load(fd)
    
tweetText=[]
for tw in tweets:
    try:
        tweetText.append((tw['id'],tw['text'],tw['created_at'],tw['favorite_count'],tw['retweet_count']))
    except KeyError:
        tweetText.append((tw['id'],tw['full_text'],tw['created_at'],tw['favorite_count'],tw['retweet_count']))
        
rows=[]
for idt,txt,date,likes,retws in tweetText:
    if not txt.startswith('RT'):
        txt = " ".join([word for word in txt.split()
                                    if 'http' not in word
                                        and not word.startswith('@')
                                        and not word.startswith('#')
                                        and word != 'RT'
                                    ])
        if not txt.startswith(('Te invito a darte','Te invito a ingresar')):
            senti=clf.predict(txt)
            rows.append([idt,likes,retws,senti,date,txt])
        


df=pd.DataFrame(rows,columns=["idT","Likes","ReTweets","Positividad","fecha","txt"])

df["fecha"]=pd.to_datetime(df["fecha"])

df.sort_values(by=['fecha'],inplace=True,ascending=True)
df.set_index('fecha',inplace=True)
df.to_csv(candidatoFile+'_data.csv',encoding='utf-8')

axes=df.drop(columns=['txt','idT']).plot(subplots=True,figsize=(18, 10),grid=True,sharex=True)
axes[2].axhline(y=0.5, color='r', linestyle='--')
plt.show()
print ("Cantidad de Tweets: ",df.shape[0])
Cantidad de Tweets:  45



Messages from José Luis Gil

The following plots summarize the three variables measured for each Tweet:

  • Tweets from July 15 to August 31:
In [55]:
candidatoFile='JoseLuisGil1000_ago'
with open(candidatoFile, 'r') as fd:
    tweets=json.load(fd)
    
tweetText=[]
for tw in tweets:
    try:
        tweetText.append((tw['id'],tw['text'],tw['created_at'],tw['favorite_count'],tw['retweet_count']))
    except KeyError:
        tweetText.append((tw['id'],tw['full_text'],tw['created_at'],tw['favorite_count'],tw['retweet_count']))
        
rows=[]
for idt,txt,date,likes,retws in tweetText:
    if not txt.startswith('RT'):
        txt = " ".join([word for word in txt.split()
                                    if 'http' not in word
                                        and not word.startswith('@')
                                        and not word.startswith('#')
                                        and word != 'RT'
                                    ])
        if not txt.startswith(('Te invito a darte','Te invito a ingresar')):
            senti=clf.predict(txt)
            rows.append([idt,likes,retws,senti,date,txt])
        


df=pd.DataFrame(rows,columns=["idT","Likes","ReTweets","Positividad","fecha","txt"])

df["fecha"]=pd.to_datetime(df["fecha"])

df.sort_values(by=['fecha'],inplace=True,ascending=True)
df.set_index('fecha',inplace=True)
df.to_csv(candidatoFile+'_data.csv',encoding='utf-8')

axes=df.drop(columns=['txt','idT']).plot(subplots=True,figsize=(18, 10),grid=True,sharex=True)
axes[2].axhline(y=0.5, color='r', linestyle='--')
plt.show()
print ("Cantidad de Tweets: ",df.shape[0])
Cantidad de Tweets:  916
  • Tweets from September 1 to September 23 (one day before the debate):
In [56]:
candidatoFile='JoseLuisGil1000_predeb'
with open(candidatoFile, 'r') as fd:
    tweets=json.load(fd)
    
tweetText=[]
for tw in tweets:
    try:
        tweetText.append((tw['id'],tw['text'],tw['created_at'],tw['favorite_count'],tw['retweet_count']))
    except KeyError:
        tweetText.append((tw['id'],tw['full_text'],tw['created_at'],tw['favorite_count'],tw['retweet_count']))
        
rows=[]
for idt,txt,date,likes,retws in tweetText:
    if not txt.startswith('RT'):
        txt = " ".join([word for word in txt.split()
                                    if 'http' not in word
                                        and not word.startswith('@')
                                        and not word.startswith('#')
                                        and word != 'RT'
                                    ])
        if not txt.startswith(('Te invito a darte','Te invito a ingresar')):
            senti=clf.predict(txt)
            rows.append([idt,likes,retws,senti,date,txt])
        


df=pd.DataFrame(rows,columns=["idT","Likes","ReTweets","Positividad","fecha","txt"])

df["fecha"]=pd.to_datetime(df["fecha"])

df.sort_values(by=['fecha'],inplace=True,ascending=True)
df.set_index('fecha',inplace=True)
df.to_csv(candidatoFile+'_data.csv',encoding='utf-8')

axes=df.drop(columns=['txt','idT']).plot(subplots=True,figsize=(18, 10),grid=True,sharex=True)
axes[2].axhline(y=0.5, color='r', linestyle='--')
plt.show()
print ("Cantidad de Tweets: ",df.shape[0])
Cantidad de Tweets:  120



Messages from Gómez Baca

The following plots summarize the three variables measured for each Tweet:

  • Tweets from July 15 to August 31:
In [57]:
candidatoFile='GomezBacaxLima_ago'
with open(candidatoFile, 'r') as fd:
    tweets=json.load(fd)
    
tweetText=[]
for tw in tweets:
    try:
        tweetText.append((tw['id'],tw['text'],tw['created_at'],tw['favorite_count'],tw['retweet_count']))
    except KeyError:
        tweetText.append((tw['id'],tw['full_text'],tw['created_at'],tw['favorite_count'],tw['retweet_count']))
        
rows=[]
for idt,txt,date,likes,retws in tweetText:
    if not txt.startswith('RT'):
        txt = " ".join([word for word in txt.split()
                                    if 'http' not in word
                                        and not word.startswith('@')
                                        and not word.startswith('#')
                                        and word != 'RT'
                                    ])
        if not txt.startswith(('Te invito a darte','Te invito a ingresar')):
            senti=clf.predict(txt)
            rows.append([idt,likes,retws,senti,date,txt])
        


df=pd.DataFrame(rows,columns=["idT","Likes","ReTweets","Positividad","fecha","txt"])

df["fecha"]=pd.to_datetime(df["fecha"])

df.sort_values(by=['fecha'],inplace=True,ascending=True)
df.set_index('fecha',inplace=True)
df.to_csv(candidatoFile+'_data.csv',encoding='utf-8')

axes=df.drop(columns=['txt','idT']).plot(subplots=True,figsize=(18, 10),grid=True,sharex=True)
axes[2].axhline(y=0.5, color='r', linestyle='--')
plt.show()
print ("Cantidad de Tweets: ",df.shape[0])
Cantidad de Tweets:  326
  • Tweets from September 1 to September 23 (one day before the debate):
In [58]:
candidatoFile='GomezBacaxLima_predeb'
with open(candidatoFile, 'r') as fd:
    tweets=json.load(fd)
    
tweetText=[]
for tw in tweets:
    try:
        tweetText.append((tw['id'],tw['text'],tw['created_at'],tw['favorite_count'],tw['retweet_count']))
    except KeyError:
        tweetText.append((tw['id'],tw['full_text'],tw['created_at'],tw['favorite_count'],tw['retweet_count']))
        
rows=[]
for idt,txt,date,likes,retws in tweetText:
    if not txt.startswith('RT'):
        txt = " ".join([word for word in txt.split()
                                    if 'http' not in word
                                        and not word.startswith('@')
                                        and not word.startswith('#')
                                        and word != 'RT'
                                    ])
        if not txt.startswith(('Te invito a darte','Te invito a ingresar')):
            senti=clf.predict(txt)
            rows.append([idt,likes,retws,senti,date,txt])
        


df=pd.DataFrame(rows,columns=["idT","Likes","ReTweets","Positividad","fecha","txt"])

df["fecha"]=pd.to_datetime(df["fecha"])

df.sort_values(by=['fecha'],inplace=True,ascending=True)
df.set_index('fecha',inplace=True)
df.to_csv(candidatoFile+'_data.csv',encoding='utf-8')

axes=df.drop(columns=['txt','idT']).plot(subplots=True,figsize=(18, 10),grid=True,sharex=True)
axes[2].axhline(y=0.5, color='r', linestyle='--')
plt.show()
print ("Cantidad de Tweets: ",df.shape[0])
Cantidad de Tweets:  45